Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support --empty flag for schema-only dry runs #8971

Merged
merged 16 commits into from
Nov 29, 2023
Merged

Support --empty flag for schema-only dry runs #8971

merged 16 commits into from
Nov 29, 2023

Conversation

MichelleArk
Copy link
Contributor

@MichelleArk MichelleArk commented Nov 1, 2023

resolves #8980

Problem

Solution

  • BaseRelation changes
    • Adds an optional limit to BaseRelation
    • Implements BaseRelation.render_limited: depending on the value of self.limit, this method templates sql to wrap a rendered relation in an 'empty' select statement. Further reasoning in the spike report
    • Updates BaseRelation.__str__ to use self.render_limited if self.limit is set
  • Adds --empty flag to run and build commands.
    • This flag configuration is used to determine the value for resolve_limit on the BaseResolver
    • BaseResolver plumbs its resolve_limit to the Relation instance it is creating, which is responsible for rendering itself given an optional limit parameter.

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX
  • This PR includes type annotations for new and modified functions

@cla-bot cla-bot bot added the cla:yes label Nov 1, 2023
Copy link
Contributor

github-actions bot commented Nov 1, 2023

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

This comment was marked as outdated.

@MichelleArk MichelleArk changed the title [spike] --dry-run flag [spike] --empty flag Nov 1, 2023
@MichelleArk
Copy link
Contributor Author

MichelleArk commented Nov 1, 2023

Spike Report

The crux of this issue is in resolving ref and source calls to a subquery that wraps the relation identifier to limit its read to 0 rows. We'd want to do this in a way that is overridable by adapter, because each adapter may have a slightly different syntax for achieving this. There are a couple options I've identified:

  1. Try to make use of the existing adapter-dispatchable get_empty_subquery_sql macro, which consolidates this adapter-specific syntax.
    • However, relation rendering (Relation.render) currently occurs without a jinja context, and it would be pretty circular to provide a context to a Relation since the ref and source resolvers themselves are part the context. So the only way I can imagine doing this would be to override ref and source macros in the global_project, but this adds a good amount of complexity and will be cumbersome to maintain (changes to ref/source will need to consider the override, potentially making changes across both)
  2. Implement a default empty_subquery_template in python on the BaseRelation, which is overridable across adapters (albeit in python).
    • This has the downside of duplication across the get_empty_subquery_sql macro and the python template. But on the flip side, there should only be one correct way of writing this template so it doesn't need to be all the way in user-space (jinja), as long as its overwritable by adapter maintainers (in their BaseRelation subclass). I've spiked this as a class attribute but could easily modify it to be a method on BaseRelation if thats preferred.

I've opted for (2) in this spike and would recommend it for its relative simplicity and given there is some precedence for doing a small amount of sql templating in the adapter python module.

@MichelleArk MichelleArk self-assigned this Nov 7, 2023
@MichelleArk MichelleArk changed the title [spike] --empty flag [spike] Support --empty flag for schema-only dry runs Nov 7, 2023
@aranke aranke self-requested a review November 15, 2023 17:13
@MichelleArk MichelleArk changed the title [spike] Support --empty flag for schema-only dry runs Support --empty flag for schema-only dry runs Nov 15, 2023
Copy link
Contributor

@colin-rogers-dbt colin-rogers-dbt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple questions but generally LGTM

core/dbt/adapters/base/relation.py Show resolved Hide resolved
core/dbt/cli/params.py Show resolved Hide resolved
Copy link
Member

@aranke aranke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does using a limit of sys.maxsize instead of None make the code a little cleaner?

Then we might not even need resolve_limit.

core/dbt/adapters/base/relation.py Outdated Show resolved Hide resolved
@MichelleArk
Copy link
Contributor Author

MichelleArk commented Nov 21, 2023

Does using a limit of sys.maxsize instead of None make the code a little cleaner?

I can see how that could clean this up but it would mean that every rendered relation would be wrapped in a (select * limit max_size) - whether we are using the --empty flag or not (we could add some handling for this but then the code doesn't get much cleaner). I think this functionality should be limited to when --empty is provided to avoid changing the compiled sql under the hood on existing surface area (all ref/source calls) to keep the risk of this change low.

sys.maxsize would also be less predictable from a user-perspective, so I don't think it's worth the more elegant code path here

Copy link
Member

@aranke aranke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline, 🚢

@MichelleArk MichelleArk merged commit 5488dfb into main Nov 29, 2023
50 checks passed
@MichelleArk MichelleArk deleted the dry-run-flag branch November 29, 2023 16:06
@mwstanleyft
Copy link

Adapters need to be provided the ability to override the SQL statement here at minimum, because using a LIMIT clause in this way in BigQuery means that users will still be billed the full cost as if they had run their model for real and they will be very confused by this. BigQuery does provide some facility to avoid this but it would require either overriding the SQL statement here, or overriding the behaviour of --empty completely.

@github-christophe-oudar
Copy link

I just happened to land that PR randomly looking at the latest commits and I think I've an interesting opened discussion that correlate with your intent here:
#8560

That feature is a great step toward easily dry run.
I've set up a similar custom approach with 2 macros in my codebase

{% macro limit_ci() -%}
{% if target.name == 'ci' -%}
LIMIT 0
{%- endif %}
{%- endmacro %}

{% macro limit_dev(rows = 100) -%}
{% if target.name == 'dev' -%}
LIMIT {{rows}}
{%- endif %}
{%- endmacro %}

Yet it's very manual (but working fairly well because developers love copy pasting existing models).

However I would like to do the following on CI:

  • Run a dry run (with that empty approach)
  • Run models that have unit tests

It feels like to do so properly, having render method that is overridable through a macro would open to a lot of custom solutions 🙌

@MichelleArk MichelleArk added the user docs [docs.getdbt.com] Needs better documentation label Dec 3, 2023
@FishtownBuildBot
Copy link
Collaborator

Opened a new issue in dbt-labs/docs.getdbt.com: dbt-labs/docs.getdbt.com#4572

@jtcohen6
Copy link
Contributor

jtcohen6 commented Dec 3, 2023

@github-christophe-oudar That's exactly the motivation behind this --empty feature: to support unit testing in CI, without building any "real" data. We do see extensions for limiting/sampling data in development as well — check out this issue:

@mwstanleyft In my experience, while limit 0 is not sufficient on BigQuery, where false limit 0 avoids scanning any data and billing for bytes processed. There have been a one or two edge cases where this doesn't work (Christophe has found a few of them!), but these have tended to be bugs on BigQuery's side, and generally within more complex DML statements, not simple select / CTA.

@MichelleArk
Copy link
Contributor Author

@mwstanleyft -- thanks for highlighting this issue for adapters in general and BigQuery specifically. For BigQuery, I believe the default implementation of render_limited in this PR for --empty should be cost-effective because of the where false clause that is included in the limit 0 case of the sql generated in render_limited:

No limit => full scan
Screenshot 2023-12-03 at 8 28 48 PM
Limit => full scan
Screenshot 2023-12-03 at 8 29 02 PM
Limit + where false => no scan
Screenshot 2023-12-03 at 8 29 14 PM

Actually - it seems like BigQuery scans 0 bytes if a limit 0 is provided:
Screenshot 2023-12-03 at 8 38 38 PM

Regardless though - adapter implementations can overwrite the default behaviour (in python, not in user-space) of render_limited to modify/opt-out of this behaviour.

We'll definitely keep this issue in mind as we continue to refine sampling / limiting beyond this 'empty' use case.

colin-rogers-dbt added a commit that referenced this pull request Dec 6, 2023
* moving types_pb2.py to common/events

* Restore warning on unpinned git packages (#9157)

* Support --empty flag for schema-only dry runs (#8971)

* Fix ensuring we produce valid jsonschema artifacts for manifest, catalog, sources, and run-results (#9155)

* Drop `all_refs=True` from jsonschema-ization build process

Passing `all_refs=True` makes it so that Everything is a ref, even
the top level schema. In jsonschema land, this essentially makes the
produced artifact not a full schema, but a fractal object to be included
in a schema. Thus when `$id` is passed in, jsonschema tools blow up
because `$id` is for identifying a schema, which we explicitly weren't
creating. The alternative was to drop the inclusion of `$id`. Howver, we're
intending to create a schema, and having an `$id` is recommended best
practice. Additionally since we were intending to create a schema,
not a fractal, it seemed best to create to full schema.

* Explicity produce jsonschemas using DRAFT_2020_12 dialect

Previously were were implicitly using the `DRAFT_2020_12` dialect through
mashumaro. It felt wise to begin explicitly specifying this. First, it
is closest in available mashumaro provided dialects to what we produced
pre 1.7. Secondly, if mashumaro changes its default for whatever reason
(say a new dialect is added, and mashumaro moves to that), we don't want
to automatically inherit that.

* Bump manifest version to v12

Core 1.7 released with manifest v11, and we don't want to be overriding
that with 1.8. It'd be weird for 1.7 and 1.8 to both have v11 manifests,
but for them to be different, right?

* Begin including schema dialect specification in produced jsonschema

In jsonschema's documentation they state
> It's not always easy to tell which draft a JSON Schema is using.
> You can use the $schema keyword to declare which version of the JSON Schema specification the schema is written to.
> It's generally good practice to include it, though it is not required.

and

> For brevity, the $schema keyword isn't included in most of the examples in this book, but it should always be used in the real world.

Basically, to know how to parse a schema, it's important to include what
schema dialect is being used for the schema specification. The change in
this commit ensures we include that information.

* Create manifest v12 jsonschema specification

* Add change documentation for jsonschema schema production fix

* Bump run-results version to v6

* Generate new v6 run-results jsonschema

* Regenerate catalog v1 and sources v3 with fixed jsonschema production

* Update tests to handle bumped versions of manifest and run-results

---------

Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
Co-authored-by: Michelle Ark <MichelleArk@users.noreply.github.com>
Co-authored-by: Quigley Malcolm <QMalcolm@users.noreply.github.com>
colin-rogers-dbt added a commit that referenced this pull request Dec 22, 2023
* remove dbt.contracts.connection imports from adapter module

* Move events to common (#8676)

* Move events to common

* More Type Annotations (#8536)

* Extend use of type annotations in the events module.

* Add return type of None to more __init__ definitions.

* Still more type annotations adding -> None to __init__

* Tweak per review

* Allow adapters to include python package logging in dbt logs (#8643)

* add set_package_log_level functionality

* set package handler

* set package handler

* add logging about stting up logging

* test event log handler

* add event log handler

* add event log level

* rename package and add unit tests

* revert logfile config change

* cleanup and add code comments

* add changie

* swap function for dict

* add additional unit tests

* fix unit test

* update README and protos

* fix formatting

* update precommit

---------

Co-authored-by: Peter Webb <peter.webb@dbtlabs.com>

* fix import

* move types_pb2.py from events to common/events

* move agate_helper into common

* Add utils module (#8910)

* moving types_pb2.py to common/events

* split out utils into core/common/adapters

* add changie

* remove usage of dbt.config.PartialProject from dbt/adapters (#8909)

* remove usage of dbt.config.PartialProject from dbt/adapters

* add changie

---------

Co-authored-by: Colin <colin.rogers@dbtlabs.com>

* move agate_helper unit tests under tests/unit/common

* move agate_helper into common (#8911)

* move agate_helper into common

* add changie

---------

Co-authored-by: Colin <colin.rogers@dbtlabs.com>

* remove dbt.flags.MP_CONTEXT usage in dbt/adapters (#8931)

* remove dbt.flags.LOG_CACHE_EVENTS usage in dbt/adapters (#8933)

* Refactor Base Exceptions (#8989)

* moving types_pb2.py to common/events

* Refactor Base Exceptions

* update make_log_dir_if_missing to handle str

* move remaining adapters exception imports to common/adapters
---------

Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>

* Remove usage of dbt.deprecations in dbt/adapters, enable core & adapter-specific (#9051)

* Decouple adapter constraints from core (#9054)

* Move constraints to dbt.common

* Move constraints to contracts folder, per review

* Add a changelog entry.

* move include/global_project to adapters (#8930)

* remove adapter.get_compiler (#9134)

* Move adapter logger to adapters (#9165)

* moving types_pb2.py to common/events

* Move AdapterLogger to adapter folder

* add changie

* delete accidentally merged types_pb2.py

* Move the semver package to common and alter references. (#9166)

* Move the semver package to common and alter references.

* Alter leftover references to dbt.semver, this time using from syntax.

---------

Co-authored-by: Mila Page <versusfacit@users.noreply.github.com>

* Refactor EventManager setup and interaction (#9180)

* moving types_pb2.py to common/events

* move event manager setup back to core, remove ref to global EVENT_MANAGER and clean up event manager functions

* move invocation_id from events to first class common concept

* move lowercase utils to common

* move lowercase utils to common

* ref CAPTURE_STREAM through method

* add changie

* first pass: adapter migration script (#9160)

* Decouple macro generator from adapters (#9149)

* Remove usage of dbt.contracts.relation in dbt/adapters (#9207)

* Remove ResultNode usage from connections (#9211)

* Add RelationConfig Protocol for use in Relation.create_from (#9210)

* move relation contract to dbt.adapters

* changelog entry

* first pass: clean up relation.create_from

* type ignores

* type ignore

* changelog entry

* update RelationConfig variable names

* Merge main into feature/decouple-adapters-from-core (#9240)

* moving types_pb2.py to common/events

* Restore warning on unpinned git packages (#9157)

* Support --empty flag for schema-only dry runs (#8971)

* Fix ensuring we produce valid jsonschema artifacts for manifest, catalog, sources, and run-results (#9155)

* Drop `all_refs=True` from jsonschema-ization build process

Passing `all_refs=True` makes it so that Everything is a ref, even
the top level schema. In jsonschema land, this essentially makes the
produced artifact not a full schema, but a fractal object to be included
in a schema. Thus when `$id` is passed in, jsonschema tools blow up
because `$id` is for identifying a schema, which we explicitly weren't
creating. The alternative was to drop the inclusion of `$id`. Howver, we're
intending to create a schema, and having an `$id` is recommended best
practice. Additionally since we were intending to create a schema,
not a fractal, it seemed best to create to full schema.

* Explicity produce jsonschemas using DRAFT_2020_12 dialect

Previously were were implicitly using the `DRAFT_2020_12` dialect through
mashumaro. It felt wise to begin explicitly specifying this. First, it
is closest in available mashumaro provided dialects to what we produced
pre 1.7. Secondly, if mashumaro changes its default for whatever reason
(say a new dialect is added, and mashumaro moves to that), we don't want
to automatically inherit that.

* Bump manifest version to v12

Core 1.7 released with manifest v11, and we don't want to be overriding
that with 1.8. It'd be weird for 1.7 and 1.8 to both have v11 manifests,
but for them to be different, right?

* Begin including schema dialect specification in produced jsonschema

In jsonschema's documentation they state
> It's not always easy to tell which draft a JSON Schema is using.
> You can use the $schema keyword to declare which version of the JSON Schema specification the schema is written to.
> It's generally good practice to include it, though it is not required.

and

> For brevity, the $schema keyword isn't included in most of the examples in this book, but it should always be used in the real world.

Basically, to know how to parse a schema, it's important to include what
schema dialect is being used for the schema specification. The change in
this commit ensures we include that information.

* Create manifest v12 jsonschema specification

* Add change documentation for jsonschema schema production fix

* Bump run-results version to v6

* Generate new v6 run-results jsonschema

* Regenerate catalog v1 and sources v3 with fixed jsonschema production

* Update tests to handle bumped versions of manifest and run-results

---------

Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
Co-authored-by: Michelle Ark <MichelleArk@users.noreply.github.com>
Co-authored-by: Quigley Malcolm <QMalcolm@users.noreply.github.com>

* Move BaseConfig to Common (#9224)

* moving types_pb2.py to common/events

* move BaseConfig and assorted dependencies to common

* move ShowBehavior and OnConfigurationChange to common

* add changie

* Remove manifest from catalog and connection method signatures (#9242)

* Add MacroResolverProtocol, remove lazy loading of manifest in adapter.execute_macro (#9243)

* remove manifest from adapter.execute_macro, replace with MacroResolver + remove lazy loading

* rename to MacroResolverProtocol

* pass MacroResolverProtcol in adapter.calculate_freshness_from_metadata

* changelog entry

* fix adapter.calculate_freshness call

* pass context to MacroQueryStringSetter (#9248)

* moving types_pb2.py to common/events

* remove manifest from adapter.execute_macro, replace with MacroResolver + remove lazy loading

* rename to MacroResolverProtocol

* pass MacroResolverProtcol in adapter.calculate_freshness_from_metadata

* changelog entry

* fix adapter.calculate_freshness call

* pass context to MacroQueryStringSetter

* changelog entry

---------

Co-authored-by: Colin <colin.rogers@dbtlabs.com>

* add macro_context_generator on adapter (#9251)

* moving types_pb2.py to common/events

* remove manifest from adapter.execute_macro, replace with MacroResolver + remove lazy loading

* rename to MacroResolverProtocol

* pass MacroResolverProtcol in adapter.calculate_freshness_from_metadata

* changelog entry

* fix adapter.calculate_freshness call

* add macro_context_generator on adapter

* fix adapter test setup

* changelog entry

* Update parser to support conversion metrics (#9173)

* added ConversionTypeParams classes

* updated parser for ConversionTypeParams

* added step to populate input_measure for conversion metrics

* version bump on DSI

* comment back manifest generating line

* updated v12 schemas

* added tests

* added changelog

* Add typing for macro_context_generator, fix query_header_context

---------

Co-authored-by: Colin <colin.rogers@dbtlabs.com>
Co-authored-by: William Deng <33618746+WilliamDee@users.noreply.github.com>

* Pass mp_context to adapter factory (#9275)

* moving types_pb2.py to common/events

* require core to pass mp_context to adapter factory

* add changie

* fix SpawnContext annotation

* Fix include for decoupling (#9286)

* moving types_pb2.py to common/events

* fix include path in MANIFEST.in

* Fix include for decoupling (#9288)

* moving types_pb2.py to common/events

* fix include path in MANIFEST.in

* add index.html to in MANIFEST.in

* move system client to common (#9294)

* moving types_pb2.py to common/events

* move system.py to common

* add changie update README

* remove dbt.utils from semver.py

* remove aliasing connection_exception_retry

* Update materialized views to use RelationConfigs and remove refs to dbt.utils (#9291)

* moving types_pb2.py to common/events

* add AdapterRuntimeConfig protocol and clean up dbt-postgress core imports

* add changie

* remove AdapterRuntimeConfig

* update changelog

* Add config field to RelationConfig (#9300)

* moving types_pb2.py to common/events

* add config field to RelationConfig

* merge main into feature/decouple-adapters-from-core (#9305)

* moving types_pb2.py to common/events

* Update parser to support conversion metrics (#9173)

* added ConversionTypeParams classes

* updated parser for ConversionTypeParams

* added step to populate input_measure for conversion metrics

* version bump on DSI

* comment back manifest generating line

* updated v12 schemas

* added tests

* added changelog

* Remove `--dry-run` flag from `dbt deps` (#9169)

* Rm --dry-run flag for dbt deps

* Add changelog entry

* Update test

* PR feedback

* adding clean_up methods to basic and unique_id tests (#9195)

* init attempt of adding clean_up methods to basic and unique_id tests

* swapping cleanup method drop of test_schema to unique_schema to test breakage on docs_generate test

* moving the clean_up method down into class BaseDocsGenerate

* remove drop relation for unique_schema

* manually define alternate_schema for clean_up as not being seen as part of project_config

* add changelog

* remove unneeded changelog

* uncomment line that generates new manifest and delete manifest our changes created

* make sure the manifest test is deleted and readd older version of manifest.json to appease test

* manually revert file to previous commit

* Revert "manually revert file to previous commit"

This reverts commit a755419.

---------

Co-authored-by: William Deng <33618746+WilliamDee@users.noreply.github.com>
Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
Co-authored-by: Matthew McKnight <91097623+McKnight-42@users.noreply.github.com>

* resolve merge conflict on unparsed.py (#9309)

* moving types_pb2.py to common/events

* Update parser to support conversion metrics (#9173)

* added ConversionTypeParams classes

* updated parser for ConversionTypeParams

* added step to populate input_measure for conversion metrics

* version bump on DSI

* comment back manifest generating line

* updated v12 schemas

* added tests

* added changelog

* Remove `--dry-run` flag from `dbt deps` (#9169)

* Rm --dry-run flag for dbt deps

* Add changelog entry

* Update test

* PR feedback

* adding clean_up methods to basic and unique_id tests (#9195)

* init attempt of adding clean_up methods to basic and unique_id tests

* swapping cleanup method drop of test_schema to unique_schema to test breakage on docs_generate test

* moving the clean_up method down into class BaseDocsGenerate

* remove drop relation for unique_schema

* manually define alternate_schema for clean_up as not being seen as part of project_config

* add changelog

* remove unneeded changelog

* uncomment line that generates new manifest and delete manifest our changes created

* make sure the manifest test is deleted and readd older version of manifest.json to appease test

* manually revert file to previous commit

* Revert "manually revert file to previous commit"

This reverts commit a755419.

---------

Co-authored-by: William Deng <33618746+WilliamDee@users.noreply.github.com>
Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
Co-authored-by: Matthew McKnight <91097623+McKnight-42@users.noreply.github.com>

* Resolve unparsed.py conflict (#9311)

* Update parser to support conversion metrics (#9173)

* added ConversionTypeParams classes

* updated parser for ConversionTypeParams

* added step to populate input_measure for conversion metrics

* version bump on DSI

* comment back manifest generating line

* updated v12 schemas

* added tests

* added changelog

* Remove `--dry-run` flag from `dbt deps` (#9169)

* Rm --dry-run flag for dbt deps

* Add changelog entry

* Update test

* PR feedback

* adding clean_up methods to basic and unique_id tests (#9195)

* init attempt of adding clean_up methods to basic and unique_id tests

* swapping cleanup method drop of test_schema to unique_schema to test breakage on docs_generate test

* moving the clean_up method down into class BaseDocsGenerate

* remove drop relation for unique_schema

* manually define alternate_schema for clean_up as not being seen as part of project_config

* add changelog

* remove unneeded changelog

* uncomment line that generates new manifest and delete manifest our changes created

* make sure the manifest test is deleted and readd older version of manifest.json to appease test

* manually revert file to previous commit

* Revert "manually revert file to previous commit"

This reverts commit a755419.

---------

Co-authored-by: William Deng <33618746+WilliamDee@users.noreply.github.com>
Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
Co-authored-by: Matthew McKnight <91097623+McKnight-42@users.noreply.github.com>

---------

Co-authored-by: colin-rogers-dbt <111200756+colin-rogers-dbt@users.noreply.github.com>
Co-authored-by: Peter Webb <peter.webb@dbtlabs.com>
Co-authored-by: Colin <colin.rogers@dbtlabs.com>
Co-authored-by: Mila Page <67295367+VersusFacit@users.noreply.github.com>
Co-authored-by: Mila Page <versusfacit@users.noreply.github.com>
Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
Co-authored-by: Quigley Malcolm <QMalcolm@users.noreply.github.com>
Co-authored-by: William Deng <33618746+WilliamDee@users.noreply.github.com>
Co-authored-by: Matthew McKnight <91097623+McKnight-42@users.noreply.github.com>
Co-authored-by: Chenyu Li <chenyu.li@dbtlabs.com>
@github-christophe-oudar
Copy link

I finally tried this feature and found one major caveat:
I suspect, like me, a lot of dbt users are using FROM {{ source('schema', 'table') }} my_table_alias
Then using --empty will result in
FROM (select * from `db`.`schema`.`table` where false limit 0) _dbt_limit_subq my_table_alias
Which obviously won't work.

There are at least few solutions that could be explored to solve that problem:

  • update source macro to support an alias parameter
  • update source macro to support adding or not a default alias _dbt_limit_subq
  • set no alias (but then I think some DBs requires an alias and then it would fail for them with default setup)

@seub
Copy link
Contributor

seub commented May 28, 2024

I finally tried this feature and found one major caveat: I suspect, like me, a lot of dbt users are using FROM {{ source('schema', 'table') }} my_table_alias Then using --empty will result in FROM (select * from `db`.`schema`.`table` where false limit 0) _dbt_limit_subq my_table_alias Which obviously won't work.

Hey @github-christophe-oudar, I'm curious about your issue but I'm not following, can you clarify? Why would --empty add _dbt_limit_subq in your query?

@github-christophe-oudar

@seub it's because of dbt-labs/dbt-adapters#124, it's fixed since then

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla:yes user docs [docs.getdbt.com] Needs better documentation
Projects
None yet
8 participants